Candidate Pruning-Based Differentially Private Frequent Itemsets Mining

نویسندگان

  • Yangyang Xu
  • Zhaobin Liu
  • Zhonglian Hu
  • Zhiyang Li
چکیده

Frequent Itemsets Mining(FIM) is a typical data mining task and has gained much attention. Due to the consideration of individual privacy, various studies have been focusing on privacy-preserving FIM problems. Differential privacy has emerged as a promising scheme for protecting individual privacy in data mining against adversaries with arbitrary background knowledge. In this paper, we present an approach to exploring frequent itemsets under rigorous differential privacy model, a recently introduced definition which provides rigorous privacy guarantees in the presence of arbitrary external information. The main idea of differentially privacy FIM is perturbing the support of item which can hide changes caused by absence of any single item. The key observation is that pruning the number of unpromising candidate items can effectively reduce noise added in differential privacy mechanism, which can bring about a better tradeoff between utility and privacy of the result. In order to effectively remove the unpromising items from each candidate set, we use a progressive sampling method to get a super set of frequent items, which is usually much smaller than the original item database. Then the sampled set will be used to shrink candidate set. Extensive experiments on real data sets illustrate that our algorithm can greatly reduce the noise scale injected and output frequent itemsets with high accuracy while satisfying differential privacy.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Accelerating Closed Frequent Itemset Mining by Elimination of Null Transactions

The mining of frequent itemsets is often challenged by the length of the patterns mined and also by the number of transactions considered for the mining process. Another acute challenge that concerns the performance of any association rule mining algorithm is the presence of „null‟ transactions. This work proposes a closed frequent itemset mining algorithm viz., Closed Frequent Itemset Mining a...

متن کامل

Mining High Utility Pattern in One Phase without Candidate Generation using up Growth+ Algorithm

Utility mining developed to address the limitation of frequent itemset mining by introducing interestingness measures that satisfies both the statistical significance and the user’s expectation. Existing high utility itemsets mining algorithms two steps: first, generate a large number of candidate itemsets and second, identify high utility itemsets from the candidates by an additional scan of t...

متن کامل

Preknowledge-based generalized association rules mining

The subject of this paper is the mining of generalized association rules using pruning techniques. Given a large transaction database and a hierarchical taxonomy tree of the items, we attempt to find the association rules between the items at different levels in the taxonomy tree under the assumption that original frequent itemsets and association rules have already been generated in advance. T...

متن کامل

An Algorithm for Mining Maximum Frequent Itemsets Using Data-sets Condensing and Intersection Pruning

Discovering maximal frequent itemset is a key issue in data mining; the Apriori-like algorithms use candidate itemsets generating/testing method, but this approach is highly time-consuming. To look for an algorithm that can avoid the generating of vast volume of candidate itemsets, nor the generating of frequent pattern tree, DCIP algorithm uses data-set condensing and intersection pruning to f...

متن کامل

Discovering Maximal Frequent Item set using Association Array and Depth First Search Procedure with Effective Pruning Mechanisms

The first step of association rule mining is finding out all frequent itemsets. Generation of reliable association rules are based on all frequent itemsets found in the first step. Obtaining all frequent itemsets in a large database leads the overall performance in the association rule mining. In this paper, an efficient method for discovering the maximal frequent itemsets is proposed. This met...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016